0.1 Importing the original dataset

Here we read our data and store into the variable df. The dataset was taken from the original project available here

df <- read.csv2('https://raw.githubusercontent.com/mcasky16/RR_Final_Project/main/student_feedback.csv', sep = ',')

COLUMN NAMES Taking a look at our column names.

names(df)
##  [1] "X"                                                   
##  [2] "Student.ID"                                          
##  [3] "Well.versed.with.the.subject"                        
##  [4] "Explains.concepts.in.an.understandable.way"          
##  [5] "Use.of.presentations"                                
##  [6] "Degree.of.difficulty.of.assignments"                 
##  [7] "Solves.doubts.willingly"                             
##  [8] "Structuring.of.the.course"                           
##  [9] "Provides.support.for.students.going.above.and.beyond"
## [10] "Course.recommendation.based.on.relevance"

0.2 POTENTIAL QUESTIONS TO BE ANSWERED

How is “course recommendation based on relevance distributed” as density plot?

Average use of presentation

CORRELATION between solves doubt AND explain concept in an understandable way

Bar chart student well versed with the topic

On the scale to one to 10 how much difficult assignment is 6- scale and 7-scale

Bar chart rating for teacher that Solves doubt willingly

0.3 REMOVING MISSING VALUES TO CLEAN THE DATA AND AVOID ERRORS

df_nona implies df with no null values

df_nona = df%>%drop_na()

0.4 How is “course recommendation based on relevance distributed” as density plot?

From the plot below it could be seen that, most of the students chose to recommend the course to others since the plot is more densed at the point of the scale reading from 5.0 to 10 which is above the average of the scale. However, this difference is not that much across the scale.

## Warning: Ignoring unknown parameters: bins

0.5 Average use of presentation

df_nona %>%
summarize(average_use_ofpresentation= round(mean(Use.of.presentations)))
##   average_use_ofpresentation
## 1                          6

The average use of presentation was given to be 6 on the scale from the calculation above. This tells that, more students prefer courses that has some sort of presentation by the instructors.

0.6 On the scale to one to 10 ho much difficult assignment is 6- scale and 7-scale?

In this plot, it was checked from the scale 6 to scale 7, how the students are distibuted across the scale. It can be seen that there is quiet a large number of students who are rating between 6 to 6.5 than from 6.5 to 7. This explains that, the students are more likely to rate difficulty in assignment on the scle of 6 than on 7.

df_nona %>%
  filter(Degree.of.difficulty.of.assignments %in% c('6', '7'))%>%
  ggplot() + 
  geom_density(aes(x = Degree.of.difficulty.of.assignments , fill = Degree.of.difficulty.of.assignments , alpha = 0.1))+
  labs(title = 'Degree.of.difficulty.of.assignments',
       subtitle= 'Mumbai student survey' , x= 'Degree.of.difficulty.of.assignments',y= 'Frequency') +
  ggthemes::theme_stata()

0.7 Bar chart student well versed with the topic

This plot shows how students in the original data well understood the topic that were conducted by their favorite lecturers. In this plot below, it could be seen that, students understanding of the course ranged from 5 to 10 on the scale and majority of the students indicating 9 as their level of understanding which is very high.

df_nona %>%
  group_by(Well.versed.with.the.subject)%>%
  count(sort =TRUE) %>%
  head(10)%>%
  ggplot() +
  geom_col(aes(x= reorder(Well.versed.with.the.subject , n) , y = n), fill = "Pink")  + 
  geom_label(aes(x= reorder(Well.versed.with.the.subject , n) , y = n , label=n))+
  coord_flip() +
  labs(title = 'no. of student who are well versed ',
        x= 'scale',y= 'Frequency')+
  ggthemes::theme_economist_white()-> scale.... 
plotly:: ggplotly(scale....)

0.8 Bar chart rating for teacher that Solves doubt willingly

A bar plot representation was used to answer the question regarding how teachers are willing to solve students doubt. From the plot it could be seen that, majority of the students rated 6. But we can tell from the heights of the bars that, the differences in the rating were not that huge so one can conclude the respondents were almost evenly spread.

df_nona %>%
  group_by(Solves.doubts.willingly)%>%
  count(sort =TRUE)%>%
  ggplot() +
  geom_col(aes(x= reorder(Solves.doubts.willingly, n) , y = n), fill = "purple" )  + 
  geom_label(aes(x= reorder(Solves.doubts.willingly , n) , y = n , label=n))+
 
  ggthemes::theme_gdocs()->gg....Solves.doubts.willingly
plotly:: ggplotly(gg....Solves.doubts.willingly)

0.9 CORRELATION BETWEEN solves doubt AND explain concept in an understandable way

The sample correlation coefficient is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points, as in the calculation above, possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward).

df_nona%>%
  select(c('Solves.doubts.willingly' ,'Explains.concepts.in.an.understandable.way')) %>%
  cor()
##                                            Solves.doubts.willingly
## Solves.doubts.willingly                                 1.00000000
## Explains.concepts.in.an.understandable.way             -0.02583881
##                                            Explains.concepts.in.an.understandable.way
## Solves.doubts.willingly                                                   -0.02583881
## Explains.concepts.in.an.understandable.way                                 1.00000000

Correlation less than 0 indicate a negative correlation in our case it is -0.0258, however this correlation is almost insignificant, showing that there is little to no correlation. Below is the corresponding scatter plot. This is consistent with the initial research upon reproduction.

0.10 SCATTER PLOT

Since the points or values are range in absolute integer the scatter plot is not showing good result. This is a 2D scatter plot showing how the lecturer explains concepts in an understandable way versus the lecturer solving doubts willingly. The visuals show that there is a general average or constant relationship so to say on the explanation of concepts in an understanding way and solving doubts willingly. There is however little variance in the data.

df_nona%>%
  select(c('Solves.doubts.willingly' ,'Explains.concepts.in.an.understandable.way')) %>%
  ggplot(aes(x= Solves.doubts.willingly , y = Explains.concepts.in.an.understandable.way))+
  geom_point(color = "#6c094f")+
  geom_smooth(method = lm)+
  labs(title = "correlation Scatter Plot" )

0.11 Importing Dataset from the survey data.

Here we import the data from our Google form survey. The dataframe is named df_2

df_2 <- read.csv2('https://raw.githubusercontent.com/mcasky16/RR_Final_Project/main/RR_survey.csv', sep = ',')

REMOVING MISSING VALUES TO CLEAN THE DATA AND AVOID ERRORS

df_2_nona implies df_2 with no null values

df_2_nona = df_2%>%drop_na()

0.12 How is “course recommendation based on relevance distributed” as density plot?

Unlike the original data, the density plot shows that the students woould not recommend this course to others based on relevance. hence the plot is more dense on the left side. This could be to the fact that, students take up this course based on other reasons rather than relevance.

## Warning: Ignoring unknown parameters: bins

0.13 Average use of presentation

df_2_nona %>%
summarize(average_use_ofpresentation= round(mean(Use.of.presentations)))
##   average_use_ofpresentation
## 1                          5

The average use of presentation was given to be 5 on the scale from the calculation above. This tells that, more students prefer courses that has some sort of presentation by the instructors. This is more or less a reflection the original dataset.

0.14 On the scale to one to 10 ho much difficult assignment is 5- scale and 7-scale on Survey data.

In this plot, it was checked from the scale 6 to scale 7, how the students are distibuted across the scale. It can be seen that there is quiet a large number of students who are rating between 6.5 to 7 than from 6 to 6.5. This explains that, the students are more likely to rate difficulty in assignment on the scale of 7 than on 6.

df_2_nona %>%
  filter(Degree.of.difficulty.of.assignments %in% c('6', '7'))%>%
  ggplot() + 
  geom_density(aes(x = Degree.of.difficulty.of.assignments , fill = Degree.of.difficulty.of.assignments , alpha = 0.1))+
  labs(title = 'Degree.of.difficulty.of.assignments',
       subtitle= 'Survey data set' , x= 'Degree.of.difficulty.of.assignments',y= 'Frequency') +
  ggthemes::theme_stata()

0.15 Bar chart student well versed with the topic on Survey Data

Here we checked for the frequencies of the respondents from the survey data and compare how well they understand the course from their favorite lecturers.

From our plot below, unlike the results from the main data, the respondents were rather not well versed or understand the course from their favorite lecturers since majority of the respondents rather chose not to be well versed with the subjects. However, quite a number of them also chose a scale above 5, which is a reflection of the main original data.

df_2_nona %>%
  group_by(Well.versed.with.the.subject)%>%
  count(sort =TRUE) %>%
  head(10)%>%
  ggplot() +
  geom_col(aes(x= reorder(Well.versed.with.the.subject , n) , y = n), fill = "Green")  + 
  geom_label(aes(x= reorder(Well.versed.with.the.subject , n) , y = n , label=n))+
  coord_flip() +
  labs(title = 'no. of student who are well versed ',
        x= 'scale',y= 'Frequency')+
  ggthemes::theme_economist_white()-> scale.... 
plotly:: ggplotly(scale....)

0.16 Bar chart rating for teacher that Solves doubt willingly on survey data

The results in this plot look much different from the original data, but there also seem to be a reflection of the main dataset in the middle belt of the plot when the respondents were evenly spread. But comparing the difference between the scale with least respondents and that of the scale with much responses we can tell the data is very much varying.

df_2_nona %>%
  group_by(Solves.doubts.willingly)%>%
  count(sort =TRUE)%>%
  ggplot() +
  geom_col(aes(x= reorder(Solves.doubts.willingly, n) , y = n), fill = "Orange" )  + 
  geom_label(aes(x= reorder(Solves.doubts.willingly , n) , y = n , label=n))+
 
  ggthemes::theme_gdocs()->gg....Solves.doubts.willingly
plotly:: ggplotly(gg....Solves.doubts.willingly)

0.17 CORRELATION BETWEEN solves doubt AND explain concept in an understandable way for Survey Data

df_2_nona%>%
  select(c('Solves.doubts.willingly' ,'Explains.concepts.in.an.understandable.way')) %>%
  cor()
##                                            Solves.doubts.willingly
## Solves.doubts.willingly                                 1.00000000
## Explains.concepts.in.an.understandable.way             -0.05688994
##                                            Explains.concepts.in.an.understandable.way
## Solves.doubts.willingly                                                   -0.05688994
## Explains.concepts.in.an.understandable.way                                 1.00000000

The sample correlation coefficient is a measure of the closeness of association of the points in a scatter plot to a linear regression line based on those points, as in the calculation above, possible values of the correlation coefficient range from -1 to +1, with -1 indicating a perfectly linear negative, i.e., inverse, correlation (sloping downward) and +1 indicating a perfectly linear positive correlation (sloping upward). Correlation less than 0 indicate a negative correlation in our case it is -0.0569. Below is the corresponding scatter plot.

0.18 SCATTER PLOT for Survey Data

Since the points or values are range in absolute integer the scatter plot is not showing good result. This is a 2D scatter plot showing how the lecturer explains concepts in an understandable way versus the lecturer solving doubts willingly. The visuals show that there is a general average or constant relationship so to say on the explanation of concepts in an understanding way and solving doubts willingly. Compared to the initial study, there is however more variance in the data. We can however notice the difference in the distribution of the data between the original and the survey data.

df_2_nona%>%
  select(c('Solves.doubts.willingly' ,'Explains.concepts.in.an.understandable.way')) %>%
  ggplot(aes(x= Solves.doubts.willingly , y = Explains.concepts.in.an.understandable.way))+
  geom_point(color = "#6c67f8")+
  geom_smooth(method = lm)+
  labs(title = "correlation Scatter Plot" )

## Conclusions

These are our final conclusions of the study and what has been reproduced.

  • We conclude that students tends to more versed in topics conducted by their favorite instructors in the case of the original data set and but not same can be said about the survey data

  • Most students would recommend the course base on relevance in the original data set but not in the case of the survey data.

  • Students from both data set almost had the same thoughts regarding the difficulty in assignments judging from the nature of the curve.

0.19 Limitations

As a limitation, the survey data was too small. And also the location from which both data has been collected may be reason for the difference in results.